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CPUs and operating systems are moving from 32 to 64 bits, and hence it is important to have good pseu¬ 
dorandom number generators designed to fully exploit these word lengths. However, existing 64-hit very 
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equidistribution properties. Here we develop 64-bit maximally equidistributed pseudorandom number gen¬ 
erators that are optimal in this respect and have speeds equivalent to 64-bit Mersenne Twisters. We provide 
a table of specific parameters with period lengths from 2®^^ — 1 to — 1. 

Categories and Subject Descriptors: G.4 [Mathematical Software]: Algorithm design and analysis; 1.6 
[Computing Methodologies]: Simulation and Modeling 

General Terms: Design, Algorithms, Performance 

Additional Key Words and Phrases: Random number generator, Mersenne Twister, Equidistribution, Em¬ 
pirical statistical testing 

ACM Reference Format: 

Shin Harase and Takamitsu Kimoto. 2017. Implementing 64-bit maximally equidistributed F 2 -hnear gen¬ 
erators with Mersenne prime period. ACM Trans. Math. Softw. V, N, Article A (January YYYY), 11 pages. 
DDI:http://dx.doi.org/10.1145/0000000.0000000 

1. INTRODUCTION 

Monte Carlo simulations are a basic tool in financial engineering, computa¬ 
tional physics, statistics, and other fields. To obtain precise simulation re¬ 
sults, the quality of pseudorandom number generators is important. At present, 
the 32-bit Mersenne Twister (MT) generator MT19937 (with period — 1) 

iMatsumoto and Nishimura 1998ll is one of the most widely used pseudorandom num¬ 
ber generators. However, modern CPUs and operating systems are moving from 32 to 
64 bits, and hence it is important to have high-quality generators designed to fully 
exploit 64-bit words. 

Many pseudorandom number generators, including Mersenne Twisters, are based 
on linear recurrences modulo 2; these are called ¥ 2 -linear generators. One advan¬ 
tage of these generators is that they can be assessed by means of the dimension of 
equidistribution with v-bit accuracy, which is a most informative criterion for high 
dimensional uniformity of the output sequences. In fact, MT19937 is not completely 
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optimized in this respect. IPanneton et al. [20061 developed the Well Equidistrihuted 
Long-period Linear (WELL) generators with periods from 2®^^ — 1 to — 1, which 
are completely optimized for this criterion (called maximally equidistrihuted), hut 
the parameter sets were only searched for the c ase of 32-hit gener ators. Conversely, 
there exist several 64-hit F 2 -linear generators. INishimura [200()] | developed 64-bit 
Mers enne Twisters, and the SIM D-oriented Fast Mersenne Twister (SFMT) gener¬ 
ator IlSaito and Matsumoto 200^ has a function to generate 64-bit unsigned inte- 
gers. For graphics processing units, Mersenne Twister for Graphic Processors (MTGP) 
ISaito and Matsumoto 20131 is also a good candi date. However, t hese generators are 
not maximally equidistrihuted. In earlier work, L’Ecuyer [1999) searched for 64-bit 
maximally equidistrihuted combined Tausworthe generators (with some additional 
properties). At present, though, to the best of our knowledge, there exists no 64-bit 
maximally equidistrihuted MT-type F 2 -linear generator with very long period —1, 
such as a 64-bit variant of the WELL generators. 

The aim of this article is to develop 64-bit maximally equidis trihuted F 2 -linear gen¬ 
erators with similar speed as the 64-bit Mersenne Twisters INishimura 2000l. The 
key techniques are (i) state transitions with double feedbacks IPanneton et al. 20061 
ISaito and Matsu moto 20 091 a nd (ii) linear output transformations with several mem¬ 
ory references IHarase 20091 . We provide a table of specific parameters with periods 
from 2®®^ — 1 to 2^^^®^ — 1. The design of our generators is base d on a combination of 
existing techniques, such as the WELL and dSFMT generators IPanneton et al. 20061 
ISaito and Matsumoto 200^ . but we select state transitions differently from those of 
the original WELL generators to maintain the generation speed. We refer to these 
generators as 64-bit Maximally Equidistrihuted ¥ 2 -Linear Generators (MELGs) with 
Mersenne prime period. 

In practice, we often convert unsigned integers into 53-bit double-precision floating¬ 
point numbers in [0,1) in IEEE 754 format. Our 64-bit generators are useful for this. 
To generate 64-bit output values, one can either use a pseudorandom number gener¬ 
ator whose linear recurrence is implemented with 32-bit integers and then take two 
successive 32-bit blocks or, instead, use a pseudorandom number generator whose re¬ 
currence is implemented directly over 64-bit integers. As described below, the former 
method may degrade the dimension of equidistribution with ti-bit accuracy, compared 
with simply using 32-bit output values. We consider the case of the 32-bit MT19937 
generator in Section |4l For this reason, we develop 64-bit MELGs to directly generate 
64-bit unsigned integers. 

The article is organized as follows. In the next section, we review F 2 -linear gener¬ 
ators and their theoretical criteria. We also summarize the framework of Mersenne 
Twisters. Section[3]is devoted to our main result: 64-bit MELGs. In SectionlH we com¬ 
pare our generators with others in terms of speeds, theoretical criteria, and empirical 
statistical tests. Section [5] concludes. 


2. PRELIMINARIES 
2.1. F 2 -Linear Generators 


We recall the notatio n of F 2 -linear generators; see | L’Ecuyer and Panneton 2009 
IMatsumoto et al. 20061 for details. Let F 2 := {0,1} be the two-element field, i.e., ad- 
dition and multiplication are performed modulo 2. We consider the following class of 
generators. 


Definition 2.1 {¥ 2 -lineargenerator). Let S := F^ be a p-dimensional state space (of 
the possible states of the memory assigned for generators). Let f : S ^ S he an F 2 - 
linear state transition function. Let O := F^ be the set of outputs, where w is the word 
size of the intended machine, and let o : S' —O an F 2 -linear output function. For an 
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initial state sq G S, at every time step, the state is changed hy the recursion 

s,+i = /(s,) (z = 0,1,2,...), (1) 

and the output sequence is given hy 

o(so),o(si),o(s 2 ),... G O. (2) 

We identify O as a set of unsigned tc-hit binary integers. A generator with these prop¬ 
erties is called an ¥ 2 -linear generator. 

Let P{z) he the characteristic polynomial of /. The recurrence ([11) has a period of 
length 2^* — 1 (i ts maximal possib l e value) if an d only if P{z) is a primitive polyno¬ 
mial modulo 2 IINiederreiter 19^IKnuth 199711 . When this value is reached, we say 
that the F 2 -linear generator has maximal period. Unless otherwise noted, we assume 
throughout that this condition holds. 


2.2. Quality Criteria 

Following I L’Ecuyer and Panneton 20091 IMatsumoto st 20061 . we recall two quality 
criteria for F 2 -linear generators. A most informative criterion for high dimensional 
uniformity is the dimension of equidistribution with v-bit accuracy. Assume that an 
F 2 -linear generator has the maximal period 2 ^ — 1. We identify the output set (3 := F“ 
as a set of unsigned w-bit binary integers. We focus on the v most significant bits of 
the output, and regard these bits as the output with v-bit accuracy. This amounts to 
considering the composition : S' A F^ —F 2 , where the latter mapping denotes 
taking the v most significant bits. We define the fe-tuple output function as 

: S ^ (F^)^ so ^ K(so), o„(/(so)), ■. ■, o,{f-\s^))). 


Thus, o^v'^ (sq) is the vector formed by the v most significant bits of k consecutive output 
values of the pseudorandom number generators from a state sq. 


Definition 2.2 {Dimension of equidistribution with v-bit accuracy). The generator 
is said to be fc-dimensionally equidistributed with f-bit accuracy if and only if : 
S (F^)^ is surjective. The largest value of k with this property is called the dimen¬ 
sion of equidistribution with v-bit accuracy, denoted by k{v). 


Because is F 2 -linear, fc-dimensional equidistribution with ti-bit accuracy means 
that every element in (F^)^ occurs with the same probability, when the initial state 
So is uniformly distributed over the state space S. As a criterion o f uniformity, larger 
values of k{v) for each 1 < z; < w are desirable ITootill et al. 19731 . We have a trivial 
upper bound k{v) < \j)/v\. The gap d{v) := \p/v\ — k{v) is called the dimension defect at 
V, and the sum of the gaps A := i® called the total dimension defect. If A = 0, 

the generator is said to be maximally equidistributed. For F 2 -linear generators, one 
can quickly compute k{v) for v = 1,..., w by using lattice reduction algorithms over 
formal power series fields IHarase et al. 20 111IHarase 2011L These are closely related 
to the lattice reduction algorithm originally proposed by Couture and L’Ecuyer [20()0) . 

As anothe r criterion, the number Ai of nonzero coeffi cients for P{z) should be 
close to p/2 I Compagner 199 Ij Wang and Compagner 19931. For exa mple, generators 
for which P(z) is a trinomial or pentanomial tail in statistical tests BLindholm 19681 
IMatsumoto and Kurita 19961 IMatsumoto and Nishimura 200211 . If Ni is not large 
enough, the generator suffers a long-lasting impact for poor i nitialization known a s 
a 0-excess state sq g S, which contains only a few bits set to 1 IPanneton et al. 20061 . 
Thus, Ni should be in the vicinity of p/2. 
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2.3. Mersenne Twisters 

In general, to ensure a maximal period, testing the primitivity of P{z) is the bottleneck 
in searching for long-period generators (because the factorization of 2 ^* — 1 is required). 
If p is a Mersenne exponent (i.e., 2^ — 1 is a Mer senne prime), one can instead use 
an irreducibility test that is easier and equivalent. IMatsumoto and Nishimura [19981 
developed a pseudorandom number generator with a Mersenne prime period 2 ^ — 1 
known as the Mersenne Twister. We briefly review their method. The state space S 
and state transition f : S ^ S are expressed as 

(w““’',Wj+i, Wj+ 2 , . . . , Wi+Ar_i) (w^Y,Wj+ 2 , Wi+ 3 , . . . , Wj+at), (3) 

where w^ e F™ is a w-bit word vector, w™“’' G denotes the w — r most significant 
bits of Wi, N := \p/w], and r is a non-negative integer such that p = Nw — r, so that 
the p bits of S are stored in an array oi Nw bits in which there are r unused bits, all 
set to zero. The state transition (l3]l is implemented as 

Wi+N ■■= Wi+Af © (wf’' I w[+i)A, (4) 

where G F 2 represents the r least significant bits of w^+i, © is the bitwise 

exclusive-OR (i.e., addition in F^), and (w““'' | w[^_j), a w-bit vector, is the concate¬ 
nation of the {w — r)-bit vector w™“'’ and the r-bit vector w[^j in that order. M is an 
integer such that 0 < M < N — 1, and A G F^^®" is a (w x w)-regular matrix (with the 
format in ([Hi below). Furthermore, to improve k{v), for the right-hand side of (HJ), the 
output transformation o : S ^ O is implemented as 

Wj+ 2 ,. . ., Wi+Af) G 5' 1 -^ Wj+atT G O, (5) 

where T G F^^*" is a suitable {w X w)-regular matrix. This technique is called temper¬ 
ing. From this, we obtain an output sequence watT, wat+iT, WAf+ 2 'T, ■ ■ ■ G O by mul¬ 
tiplying a matrix T by the sequence from (S]). IMatsumoto and Nishimura [19981 and 
[Nishimura [2000) searched for 32- and 64-bit Mersenne Twisters with p eriod length 
219937 _ 2 ^^ respectively. In Appendix B of IMatsumoto and Nishimura 199^ . it is proved 
that these g enerators cannot attain the maxim al equidistribution (e.g., A = 6750 for 
MT19937 in IMatsumoto and Nishimura 19981 ). In fact, the state transition in ((S)-® 
is very simple, but the linear output transformation T in I® is rather complicated (see 
IMatsumoto and Nishimura 19981 INishimura 200 ()]| for details). 

3. MAIN RESULT: 64-BIT MAXIMALLY EQUIDISTRIBUTED F 2 -LINEAR GENERATORS 
3.1. Design 

To obtain maximally equidistributed generators without loss of speed, we try to 
shift the balance of costs in / and o. The key technique is a suitable choice 
of (i) state transitions wi th double feedbacks proposed in IPanneton et al. 20061 
ISaito and Matsumoto 200 ^ and (i i) linear output transformations with several mem¬ 
ory references from BHarase 20091 . Let N = \p/w'\. We divide S' = F^ into two parts 
S = F^”™ X F^ and consider the state transition f : S S with 

(w™”’’, Wi+i,Wj+ 2 , . . . , Wj+Ar- 2 ,Vi) (w^^^ ^^+ 2 , Wi+ 3 , . . . , W^+at-I, V^+i), ( 6 ) 

where Wj+at-i and v^+i are determined by the recursions 0 and (Ull, as described be¬ 
low. The first p—w bits are stored in an array in which r bits are unused and set to zero, 
as for the original MTs. Note that the number of words is A — 1 , not N. The remaining 
word Vj is expected to be stored in a register of the CPU and updated as v^+i at the 
next step, so that the implementation requires only a single word (see Figure 1 and 
Algorithm 1 in this subsection). The use of the extra state variable v^ was originally 
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Fig. 1. Circuit-like description of the proposed generators. 


proposed by lPanneton et al. [20061 and refi ned by lSaito and Matsumoto [2009| |. (Note 
that Vi 0 in Fig. 1 of llPanneton et al. 200611 corresponds to this yariable Vj.) This ap¬ 
proach is a key techni que for drastically improying Ni and A. By refining the recursion 
formulas proposed in fSaito and Matsumoto 200^ . we implement the state transition 
/ with the following recursions: 



v.j+1 := (w““’' 1 w[^ JA © W.J+M © v.jB, 

(7) 


Wj+Ar_i := (wf’' 1 W[^;i) © Vi+iC. 

(8) 

A, B, and C are ( 

w X u>)-matrices defined indirectly as follows: 



[(w>l)©a Aww-i = l, 

(9) 


MvB ■= W©(w<(Ti), 

(10) 


wC := w©(w»(j 2 ), 

(11) 

where w = [wq, ... ,Ww-i) G F 2 and a G F“ are w-bit yectors, cti and a 2 are integers 
with 0 < (Ti,tT 2 < w, and “w ^ 1” and “w > /’’denote left and right logical (i.e., zero- 
padded) shifts by 1 bits, respectiyely. Note that is one component in a state Si € S 


and is not output. 

To attain the maximal equidistrihution exactly, we design a li near output tr ansfor- 
mation using another word in the state array, which comes from BHarase 2009L More 
precisely, for the right-hand side of ([61), we consider the following linear output trans¬ 
formation o : S ^ O with one more memory reference: 

(w[“;^“',Wj+ 2 ,..., Wi+Ar_i, Vj+i) G 5' 1-^ Wj+at-iTi © Wi+LT2 G O. (12) 

Here Ti and T 2 are {w x w)-matrices defined by 

wTi := w©(w<Ccr3), (13) 

WT 2 := (w & b), (14) 

where L is an integer with 0 < L < A — 2, 0-3 is an integer with 0 < 0-3 < w, & denotes 
bitwise AND, and b G is a u>-bit yector. A circuit-like description of the proposed 
generators is shown in Figure [H 

An equiyalent formal algorithm can be described as Algorithm 1, in which, instead 
of shifting, we use a round-robin technique (i.e., a pointer technique) to improye the ef¬ 
ficiency of the generation. Let w[0.. A — 2] be an array of A— 1 unsigned integers with w 
bits, and v be a u>-bit unsigned integer that corresponds to the extra state yariable . 
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Let X be a temporary variable and y be an output variable that are w-bit unsigned 
integers, respectively. Set ^ (1, ..., 1,0,..., 0) and rh’' v- (0,..., 0,1,..., 1). 

w—r r w—r r 

Here is a bit mask that retains the first w — r bits and sets the other r bits 

to zero, whereas m’’ is its bitwise complement. Before Algorithm 1 begins, we initialize 
(w[0] & m’""’’), w[l],..., w[A — 2], V ^ initial values, not all zero, and set the pointer 
i ^ 0. For more detailed descriptions of the initialization, see Remark l3.ll 


ALGORITHM 1: The algorithm of the proposed generators 

Input : w[0], w[l],..., w[A — 2], v, and the pointer i 
Output: w-bit output y 

Set X •<— (w[i] & m’"”’’) © (w[(i + 1) mod {N — 1)] & m’'). // compute (w““^ | w[_,_j). 
Set V ■«— xA © w[(i + M) mod {N — 1)] © vA. // compute Eq. 0 . 

Set w[i] X © v( 7 . // compute Eq. ((S). 

Set y ^ w[i]ri © w[(i + L) mod {N — 1 )]T 2 . II compute Eq. ( fl^ . 

Increment i <— i + 1 . If i > N — 1 , then i <— i mod {N — 1). // increment the pointer i. 
Return y 


3.2. Specific Parameters 

We search for specific parameters in the following way. First, we look for M, ui, 0-2, and 
a £ in (F7l)-( I11D at random such that / attains the maximal period 2^ — 1. In general, 
because we can obtain several parameters, we choose a parameter set whose Ni is large 
enough and whose output has large k{v) for w = 1, ..., ir as far as possible in the case 
where we set y w[i] in Algorithm 1 (i.e., Ti and T2 are the identity and zero matrices, 
respectively). In the next step, we search for L, 0-3, and b £ in ( I12M 14II at random 
such that the generator is “almost” maximally equidistributed (i.e., A is almost 0). Fi¬ 
nally, to obtain A = 0 strictly, we app ly a slight mo dification to the bit mask b £ F|' by 
using the backtracking algorithm in IHarase 200911 (with some trial and error). Table [I] 
lists specific parameters for 64-bit maximally equidistributed generators with periods 
ranging from 2®°^ — 1 to — 1. We attach the acronym 64-bit MELGs, for 64-bit 

“maximally equidistributed F2 -linear generators” with Mersenne prime period, to the 
proposed generators. The code in C is available at https://github.com/sharase/melg-64. 

In parallel computing, an important requirement is the availability of pseudorandom 
number generators with disjoint streams. These are usually implemented by partition¬ 
ing the output sequences of a long-period generator into long disjoint subsequences 
whose starti ng points are found by m aking large jumps in the original sequences. For 
this purpose, IHaramoto et al. [20081 proposed a fast jumping-ahead algorithm for F 2 - 
linear generators. We also implemented this algorithm for our 64-bit MELGs. The code 
is also available at the above website. The default skip size is 2^®®. 

For our generators, we implemented a function to produce double-precision floating¬ 
point nu mbers up, ui, U 2 , ■. ■ in [0,1) i n IEEE 754 format by using the method of Sec¬ 
tion 2 of ISaito and Matsumoto 200^ (i.e., the 12 bits for sign and exponent are kept 
constant, and the 52 bits of the significand are taken from the generator output). We 
note that this function is preferable from the viewpoint of k(v) because it does not 
introduce approximation errors by division. 

Remark 3.1. [Matsumoto et al. [20071 reported that many pseudorandom number 
generators have some nonrandom bit patterns when initial seeds are systematically 
chosen, especially when the initialization scheme is based on a linear congruential 
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Table I. Specific Parameters of 64-bit Maximally Equidistributed F 2 - 
Linear Generators (MELGs) 


M 

L 

(71 (72 

<^3 

a 

b 

Afi 

A 

p = 607, w = 64,r = 33,N = 10 

MJiLG607-64 5 

13 35 

81flfd68012348bc 

313 

3 

30 

66edc62a6bf8c826 

0 

p = 1279, w = 64, r = ; 

L,fV = 20 



MELG1279-64 7~ 

22 37 

Iafefdl526d3952b 

641 

5 

6 

3a23d78e8fb5e349 

0 

p = 2281, U) = 64, r = 23, Af = 36 

MELG2281-64 17 

36 21 

7 cbe23ebca8a6d36 

1145 

6 

6 

e4e2242b6el5aebe 

0 

p = 4253, w = 6i,r = 35,N = 67 

MELG4253-64 29 

30 20 

facle8c56471d722 

2129 

9 

5 

Cb67b0cl8fel4f4d 

0 

p = 11213, w = 64, r = 

51,iV = 176 



MELG11213-64 45" 

33 13 

ddbcd6e525elc757 

5455 

4 

5 

bd2dl251e589593f 

0 

p = 19937, w = 64, r = 

31,Ar = 312 



MELG19937-64 ^ 

23 33 

5c32e06df730fc42 

9603 

19 

16 

6aede6fd97b338ec 

0 

p = 44497, w = 64, r = 

47, N = 696 



MELG44497-64 373 

37 14 

4fa9ca36f293c9a9 

19475 

95 

6 

6fbbee29aaefd91 

0 


generator. To avoid such phenomena, the 2002 version of MT199 37 adopted a non¬ 
linear initializer described in Eq. (30) of iMatsumoto et al. 200711 . For our proposed 
generators, we implemented a similar initialization scheme. Let wg S be an initial 
seed. To obtain an initial state sq = (wg wi, W 2 ,..., WAr_ 2 , vg) e S, we set Wg as 
the w — r most significant bits of wg and compute 

Wj -f- a X (wi_i 0 (wi_i > 0 - 4 )) + i (mod 2“) 
for i = 1, ..., N — 2 and 

Vg -f- a X (wAr _2 © (wiv -2 > 0 - 4 )) + - 1 (mod 2’"), 

using the usual integer arithmetic, where a = 63 64136223846793005 (in decimal nota¬ 
tion) is a multiplier recommended in BKnuth 19971 pp. 104], g 4 = 62, and w = 64. These 
parameters are from the 2004 version of 64-bit MT UNishimura 2000l mentioned in 
Section 131 In a similar manner, we implemented an initializer whose seed is an array 
of integers of arbitrary length. 

Remark 3.2. As pointed out in Section 3.8 of IlSaito and Matsumoto 201^ . it might 
be difficult to run the proposed generators in a multi-threaded environment, such as 
GPUs. This is because our generators have the heavy dependencies on the partial com¬ 
putation of the extra s tate variable Vj in the recursion Q. Thus, in this c ase, it seems 
that the original MTs IMatsumoto and Nishimura 19981 INishimura 200011 are suitable 
because of the si mplicity of recursion. We not e that MTGP is designed specifically for 
this purpose (see ISaito and Matsumoto 2013ll for details). 

4. PERFORMANCE 

We compare the following F 2 -linear generators corresponding to 64-bit integer output 
sequences: 

— MELG19937-64: the 64-bit integer output of our proposed generator; 
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Table II. Figures of Merit Ni and A and CPU Time (in Seconds) Taken to Generate 10® 64-bit 
Unsigned Integers 


Generators 

Ni 

A 

CPU time (Intel) 

CPU time (AMD) 

MELG19937-64 

9603 

0 

4.2123 

6.2920 

MT19937-64 

285 

7820 

5.1002 

6.6490 

MT19937-64 (IDS) 

5795 

7940 

4.8993 

6.7930 

SFMT19937-64 (without SIMD) 

6711 

14095 

4.2654 

5.6123 

SFMT19937-64 (with SIMD) 

6711 

14095 

1.8457 

2.8806 


— MT19937-64: the 64-bit integer output of the 64-bit Mersenne Twister (downloaded 
from http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt64.html); 

— MT19937-64 (ID3): the 64-bit integer output of a 64-bit Mersenne Twister based on 
a five-term recursion (ID3) UNishimura 2000l : 

— SFMT19937-64 (without SIMD): the 64-bit inte ger output of the SIMD-orie nted Fast 
Mersenne Twister SFMT19937 without SIMD fSaito and Matsumoto 20081 : 

— SFMT19937-64 (with SIMD ): the 64-bit integer output of the foregoing with SIMD 
IlSaito and Matsumoto 20081 . 


The first three generators have the periods of 2^9937 _ ^ SFMT19937-64 ha s the period 
of a multiple of — 1 (see Proposition 1 in IlSaito and Matsumoto 200^ for details). 

Table [U summarizes the figures of merit TVi, A and timings. In this table, we report 
the CPU time (in seconds) taken to generate 10® 64-bit unsigned integers for each 
generator. The timings were obtained using two 64-bit CPUs: (i) a 3.40 GHz Intel 
Core 17-3770 and (ii) a 2.70 GHz Phenom II X6 1045T. The code was written in C and 
compiled with GCC using the -03 optimization flag on 64-bit Linux operating systems. 
For SFMT19937-64, we measure d the CPU time for the case of sequential generation 
(see IlSaito and Matsumoto 20081 for details). 

MELG19937-64 is maximally equidistributed and also has Ni « p/2. These values 
are the best in this table. In terms of generation speed, MELG19937-64 is comparable 
to or even slightly faster than the MT19937-64 generators on the above two platforms. 
SFMT19937-64 without SIMD is comparable to or faster than MELG19937-64, and 
SFMT19937-64 with SIMD is more than twice as fast as MELG19937-64. However, 
A for SFMT19937-64 is rather large. In fact, the SFMT generators are optimized un¬ 
der the assumption that one will mainly be using 32-bit output sequences, so that 
the dimensions of equidistribution with ti-bit accuracy for 64-bit output sequences are 
worse than those for 32-bit cases (A = 4188). For this, we analyze the structure of 
SFMT19937 in the online Appendix. 

Finally, we convert 64-bit integers into double-precision floating-point numbers 
uo,ui,U 2 ,... in [0,1), and submit them to the stati stical tests included in the Small- 
Crush, Crush, and BigCrush batteries of TestUOl [UEcuyer and Simard 2007 1. Note 
that these batteries have 32-bit resolution and have not yet been tailored to 64-bit 
integers. In our C-implementation, for MELG19937-64 and MT19937-64, we gen¬ 
erate the above uniform real numbers by (x » 11) * (1.0/9007199254740992.0), 
where x is a 64-bit unsigned integer output. For SFMT19937-64, we use a function 
sfint_genrandjres53(), which is obtained by dividing 64-bit unsigned integers x by 
2®^, i.e., X * (1.0/18446744073709551616.0); see the online Appendix for details. In 
any case, we investigate the 32 most significant bits of 64-bit outputs in TestUOl. The 
generators in Table [Upassed all the tests; except for the linear complexity tests (uncon¬ 
ditional failure) and matrix-rank tests (failure only for small p), which measure the F 2 - 
linear dependency of the outputs and reject F 2 -linear generators. This is a limitation 
of F 2 -linear generators. However, for som e m atrix-rank tests, we can observe differ¬ 
ences between MELGs and SFMTs. Table IIIII summarizes the p-values on the matrix- 
rank test of No. 60 of Crush for five initial states. (SFMT1279-64, SFMT2281-64, and 
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Table III. p-Values on the Matrix-Rank Test of No. 60 of Crush in TestUOt 



1st 

2nd 

Srd 

4th 

5th 

MELG1279-64 

0.89 

0.41 

0.11 

0.70 

0.22 

MELG2281-64 

0.70 

0.02 

0.62 

0.49 

0.98 

MELG19937-64 

0.23 

0.13 

0.32 

0.14 

0.85 

SFMT1279-64 

< 10"^““ 

< 10“^““ 

< 10“^'^^^ 

< 10“^'^^^ 

< 10“^'^^^ 

SFMT2281-64 

< 10"^““ 

< 10“^““ 

< 10“^'^^^ 

< 10“^'^^^ 

< 10“^'^^^ 

SFMT19937-64 

0.29 

0.02 

0.06 

0.49 

0.83 


SFMT19937-64 denote the results for double-precision floating-point numbers in [0,1) 
coi iverted from the 64-bit integ er outputs of SFMT1279, SFMT2281, and SFMT19937 
in ISaito and Matsumoto 200 ^ . respectively.) 

Remark 4.1. We occasionally see the use of the least significant bits of pseudoran¬ 
dom numbers in applications. An example is the case in which one generates uniform 
integers from 0 to 15 by taking the bit mask of the 4 least significant bits or modulo 16. 
For this, we invert the order of the bits (i.e., the i-th bit is exchanged with the (w — i)-th 
bit) in each integer and compute the dimension of equidistribution with w-bit accuracy, 
dimension defect at v, and total dimension defect for inversion, which are denoted by 
k'{v), d'{v), and A', respectively. In this case, MELG19937-64 is not maximally equidis¬ 
tributed, but A' = 4047 and d'{v) is 0 or 1 for each v < 11. Note that A' takes values 
9022, 8984, 21341 for MT19937-64, MT19937-64 (ID3), SFMT19937-64, respectively. A' 
of MELG19937-64 is still smaller than A of the other generators in Table [III However, 
as far as possible, we recommend using the most significant bits (e.g., by taking the 
right-shift in the above example), because our generators are optimized preferentially 
from the most significant bits. 

Remark 4.2. For 32-bit generators, there have been some implementations that 
produce 64-bit unsigned integers or 53-bit double-precision floating-point numbers (in 
IEEE 754 format) by concatenating two consecutive 32-bit unsigned integers. We note 
that such conversions might not be preferable from the viewpoint of k{v). As an ex¬ 
ample, consider th e 32-bit M T19937 generator in the header <random> of the C-n-11 
STL in GCC (see lilSO 20121 Chapter 26.5]). Let zo,zi,Z 2 ,... G be a 32-bit un¬ 
signed integer sequence from 32-bit MT19937. To obtain 64-bit unsigned integers, the 
GCC implements a random engine adaptor independent_bit_engine to produce 64-bit 
unsigned integers from the concatenations as 

(zo, Zi), (Z 2 , Zg), (Z 4 , Z 5 ), (zg, Z 7 ), . . . G (15) 

To generate 53-bit double-precision floating-point numbers in [0,1) (i.e., 

uniformjreal_distribution( 0 , 1 ) for MT19937), the GCC implementation gener¬ 
ates 64-bit unsigned integers 

(zi, Zo), (zg, Z 2 ), (Z 5 , Z 4 ), (Z 7 , Zg), .. . G (16) 

by concatenating two consecutive 32-bit integer outputs and divides them by the max¬ 
imum value 2®'^. The sequences ( 1151 ) and ( II 6 I 1 can be viewed as F 2 -linear generators 
with the state transition p in ([S]), so that we can compute k{v). Note that the se¬ 
quences ( I15I I and 0161 ) are different: ( II 6 I I is obtained by exchanging the 32 most sig¬ 
nificant bits for the 32 least significant bits in each 64-bit word in ( I15D . that is, 
the F 2 -linear output functions are described as Si G 5 h> {o{si),o{f{si))) G F^"^ in 
dlSl l and Si G S 1 -^ {o{f{si)),o{si)) G F^"^ in dlHl ). In fact, their A:(w)’s are different 
for V = 33,..., 64. As a result, we have A = 13543 for v = 1,...,64 in ( 1151 ) and 
A = 13161 for V = 1, ..., 52 in ( 1161 ). which are worse than A of MT19937-64. In par¬ 
ticular, fc(12) = 623 < [19937/12J = 1661 for each case. For this reason, we feel that 
there is a need to design 64-bit high-quality pseudorandom number generators. 
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5. CONCLUSIONS 

In this article, we have designed 64-hit maximally equidistributed F 2 -linear genera¬ 
tors with Mersenne prime period and searched for specific parameters with period 
lengths from 2®°^ — 1 to — 1. The key techniques are (i) state transitions with dou¬ 
ble feedbacks and (ii) linear output transformations with several memory references. 
As a result, the generation speed is still competitive with 64-bit Mersenne Twisters 
on some platforms. The code in C is available at https://github.com/sharase/melg-64 
Pseudorandom number generation is a trade-off between speed and quality. Our gen¬ 
erators offer both high performance and computational efficiency. 
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